Systems biology beyond degree, hubs and scale-free networks: the case
for multiple metrics in complex networks 
Abstract Modeling and topological analysis of networks in
biological and other complex systems must venture beyond
the limited consideration of very few network metrics like
degree, betweenness or assortativity. A proper identification
of informative and redundant metrics from among many
candidates, using recently demonstrated techniques, is essential.
A holistic comparison of networks and growth models
is best achieved with the use of such methods.
Network theory (Albert et al. 2002, Newman 2010) plays
an important role in Systems Biology. Complex network lit- 
erature is replete with discussions about networks bearing 
knowledge of function, signatures of complexity and infor- 
mation about "emergent properties" of the system being en- 
coded in their topology. It is therefore quite natural to as- 
sume that a comprehensive study of a significant number of 
network metrics would convey a lot of information about the 
system. Interestingly, however, most papers in the literature
analyze at most two or three metrics at a time. Thus arises a
very relevant yet unanswered question: do these few hand-picked
network metrics convey most of the knowledge that
could be obtained about the network?

The idea that the network topology can be a major deter- 
minant of function (or dysfunction) has been studied in con- 
siderable detail. The relation between the topological prop- 
erties of network nodes (genes, proteins) and functional es- 
sentiality is well known in interaction networks (Albert et 
al. 2000, Jeong et al. 2001). 

In metabolic networks, long before the advent of the 
complex networks era, extensive modeling had been done 
using steady-state flux balance approaches (Varma et al. 1994)
via methods like Flux Balance Analysis (FBA) (Edwards
et al. 2000), Minimization of Metabolic Adjustment (MOMA)
(Segre et al. 2002), and Elementary Mode Analysis
(EMA) (Stelling et al. 2002). Nevertheless, topological analysis
has often yielded novel and valuable insight into metabolic
networks. For example, new parameters like synthetic accessibility
(S_A) have shown sufficient power in predicting
the viability of knockout strains, with accuracy comparable
to approaches using biochemical parameters (such as FBA)
on large, unbiased mutant data sets (Wunderlich et al. 2006).
This is especially remarkable since determining S_A does not
require the knowledge of stoichiometry or maximal uptake 
rates for metabolic and transport reactions which might be 
necessary in FBA, MOMA and EMA. Also, it can be rapidly 
computed for a given network and has no adjustable param- 
eters. 

Degree, the number of connections a node has with
other nodes in the network (and sometimes also with itself), is
the most common topological metric in networks. It is perhaps
hard to find a paper in complex networks which does
not mention degree. Degree distributions are generally well- 
studied for almost all systems. Unfortunately, there is still 
a trend of labeling networks possessing heavy-tailed degree 
distributions as scale-free networks, i.e., networks having a 
power-law degree distribution. This practice is widespread in spite
of reliable statistical machinery being available for the proper
identification of scale-free networks (Clauset et al., 2009).
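
As a minimal sketch of the kind of check such machinery involves, the snippet below estimates a power-law exponent by maximum likelihood and measures the Kolmogorov-Smirnov distance between the data and the fit. It uses the continuous-variable estimator with a fixed, user-chosen lower cutoff `xmin`, and all function names are hypothetical. The full procedure of Clauset et al. (2009) goes further: it also selects `xmin`, treats discrete data such as degrees, and compares the fit against alternative heavy-tailed distributions.

```python
import math
import random


def fit_power_law_alpha(values, xmin):
    """Continuous maximum-likelihood estimate of alpha for
    p(x) ~ x^(-alpha), x >= xmin (xmin is assumed known here)."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the
    fitted power-law CDF, F(x) = 1 - (x / xmin)^(1 - alpha)."""
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model CDF with the empirical step just below and at x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic data with a known exponent, used only to exercise the code.
rng = random.Random(0)
synthetic = sample_power_law(5000, xmin=1.0, alpha=2.5, rng=rng)
alpha_hat = fit_power_law_alpha(synthetic, xmin=1.0)
distance = ks_distance(synthetic, xmin=1.0, alpha=alpha_hat)
print(f"estimated alpha = {alpha_hat:.2f}, KS distance = {distance:.3f}")
```

A small KS distance alone does not establish a power law; a heavy-tailed degree sequence should also be tested against competing distributions before it is called scale-free.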

Power-laws have a special place in statistical physics,
and hence the activity around "scale-free networks" in the physics
literature is somewhat understandable. However, it is hard
to see why "scale-free networks" are overemphasized in the
biological networks community, especially when many degree
distributions could be fit equally well or perhaps even better
by other distributions. Irrespective of whether they obey a
power-law or not, all heavy-tailed degree distributions have at
least one thing in common: hubs, or high-degree nodes, in the
network. The excessive preoccupation with "hubs" seems to stem
from the apparent conclusion that removal of these could cause
massive damage to the network. However, it was shown quite
some time ago by means of the "S-metric" that, even with a
scaling degree sequence, extremely important networks like
the internet could be structurally robust and functionally stable
(Doyle et al. 2005). Thus, the removal of hubs need not be
catastrophic and may have merely a local effect.

Fig. 1 An elementary analysis would reveal that targeting a high-
betweenness node like 7 over a 'hub' (like 3) would cause much more
damage to the network.

There have been a number of works showing that hubs 
are not always the most important nodes. Social scientists 
have known this for a long time via the analysis of graphs 
like the "Krackhardt kite graph" shown in Fig. 1 (Wasserman
et al. 1994). One of the most important properties of a
network node is "betweenness centrality", which measures 
the fraction of all shortest paths passing through that node 
(Freeman 1977). In the world air transportation network, the 
common perception would probably be that most shortest 
flights between any two airports are likely to pass through 
cities like London and New York. Actual analysis showed 
that many of the shortest paths did not pass through 60% of 
the 25 most connected airports. Instead many of them passed 
through airports like Anchorage and Port Moresby (Guimera 
et al. 2005). Of course, carrying this treatment forward from
an unweighted to a weighted network might change the results
somewhat, but their general significance is not lost. It is
also known that maximum damage would be done to the US
airline network if the airports are targeted by betweenness
rather than by degree (Wuellner et al. 2010). A number of papers
using biological networks have found important results
using betweenness (Dunn et al. 2005, Hahn et al. 2005, Joy
et al. 2005, Hegde et al. 2008, Liu et al. 2009).
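
The contrast between hubs and high-betweenness nodes can be checked directly on the Krackhardt kite graph of Fig. 1. The sketch below is illustrative only: the adjacency list uses a 0-9 labelling assumed here so that node 3 is the hub and node 7 the bridging node, as in the figure caption, and the helper names are hypothetical. Betweenness is computed as the number of shortest paths through a node (Brandes-style accumulation, unnormalized), and damage is measured by the size of the largest connected component left after removing a node.

```python
from collections import deque

# Krackhardt kite graph; the 0-9 labelling is an assumption of this sketch.
KITE = {
    0: [1, 2, 3, 5],
    1: [0, 3, 4, 6],
    2: [0, 3, 5],
    3: [0, 1, 2, 4, 5, 6],
    4: [1, 3, 6],
    5: [0, 2, 3, 6, 7],
    6: [1, 3, 4, 5, 7],
    7: [5, 6, 8],
    8: [7, 9],
    9: [8],
}


def betweenness(adj):
    """Unnormalized shortest-path betweenness for an undirected,
    unweighted graph given as an adjacency dict."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


def largest_component_after_removal(adj, removed):
    """Size of the largest connected component once `removed` is deleted."""
    remaining = set(adj) - {removed}
    seen, best = set(), 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


degree = {v: len(nbrs) for v, nbrs in KITE.items()}
centrality = betweenness(KITE)
hub = max(degree, key=degree.get)
bridge = max(centrality, key=centrality.get)
print("hub:", hub, "largest component after removal:",
      largest_component_after_removal(KITE, hub))
print("bridge:", bridge, "largest component after removal:",
      largest_component_after_removal(KITE, bridge))
```

With this labelling, removing the hub (node 3) leaves the remaining nine nodes connected, whereas removing the highest-betweenness node (node 7) splits the graph and leaves a largest component of seven nodes.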

Assortative mixing (Newman 2002), i.e. whether high- 
degree nodes are connected to other high-degree nodes in a 
network, is also known to be an important consideration in a 
number of biological networks (Bagler et al. 2007, Pechenick 
et al. 2012). While it was earlier thought that all biological 
networks are disassortative, it has been subsequently found 
that protein contact networks could be assortative (Bagler et 
al. 2007). 
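
For concreteness, degree assortativity can be computed as the Pearson correlation between the degrees found at the two ends of each edge. The following is a minimal sketch assuming an undirected, unweighted graph without self-loops, given as an edge list; the function name and the two toy graphs are hypothetical illustrations, not data from the cited studies.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of each edge.
    Returns NaN when all nodes have the same degree (zero variance)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each undirected edge in both orientations so the measure
    # is symmetric in its two ends.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


# A star: one high-degree centre attached only to degree-1 leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# A 4-clique plus a separate edge: like always connects to like.
clique_plus_edge = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]

r_star = degree_assortativity(star)
r_clique = degree_assortativity(clique_plus_edge)
print(f"star: {r_star:.2f}, clique plus edge: {r_clique:.2f}")
```

The star is the extreme disassortative case (coefficient -1), while the second toy graph is perfectly assortative (coefficient +1); real networks fall somewhere in between.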

It should be mentioned here that various measures from 
spectral graph theory are known to yield valuable insight into
graphs and have also been studied extensively in biological
networks (Banerjee et al. 2009, Perkins and Langston 2009). 

Thus, it is abundantly clear that in some circumstances 
degree is an important metric; while in some others it might 
be betweenness or assortativity and so on. This naturally
raises the question of how one can identify which metrics
are important in a given scenario and which ones are
redundant.

In recent literature, an appropriate quantitative framework
has been proposed to address this issue by incorporating
multiple network metrics and higher moments of
some of these (Filkov et al. 2009, Roy et al. 2009). These
papers considered a significant number of metrics, including
higher moments of metric distributions, wherever appropriate.
Many distributions can often (albeit not always) be
characterized by their first few moments. For example, distributions
of metrics like degree, betweenness, geodesic length or clustering
might carry important information about the system
and should be studied in depth whenever possible. Methods 
from data mining such as clustering and statistical dimen- 
sion reduction techniques like Principal Component Anal- 
ysis (PCA) (Jolliffe 2002) can then be utilized for the un- 
ambiguous identification of informative and redundant net- 
work metrics. The results obtained by this treatment clearly 
demonstrate that it is not just the degree or betweenness or 
some other metric which is important. Most of the meaning- 
ful information is actually carried by a linear combination 
of some metrics and/or the higher moments of a few metrics 
(Filkov et al. 2009, Roy et al. 2009). The essence of these
techniques is outlined below.
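
As a starting point, each metric distribution can be condensed into its first few moments, which then serve as additional features of the network. The sketch below is an assumption-laden illustration: the function name is hypothetical, population (biased) estimators are used for simplicity, and the degree sequences are toy examples rather than data from the cited papers.

```python
def distribution_moments(values):
    """Mean, variance, skewness and kurtosis of a metric distribution,
    using population (biased) estimators."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n

    def central(k):
        # k-th central moment of the sample
        return sum((x - mean) ** k for x in values) / n

    variance = central(2)
    if variance == 0:
        skewness = kurtosis = 0.0  # degenerate distribution, by convention
    else:
        skewness = central(3) / variance ** 1.5
        kurtosis = central(4) / variance ** 2
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}


# Toy degree sequences: a symmetric one and a hub-dominated (star) one.
symmetric = distribution_moments([1, 2, 3, 4, 5])
star_like = distribution_moments([4, 1, 1, 1, 1])
print(symmetric)
print(star_like)
```

The two sequences differ little in mean degree but clearly in skewness, which is the kind of information a mean-only description discards.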

A heatmap is a typical clustering tool which is used extensively
across the sciences. In a heatmap of pairwise metric correlations,
the rows and columns are arranged by hierarchical clustering so
that the most correlated metrics are placed closest to each other.
Heatmaps allow us to identify clusters of similar
network attributes by detecting blocks of squares along the
diagonal. A limited amount of clustering along the diagonal
would imply that most of the network metrics chosen
are effectively independent and could be informative for our
analysis. On the other hand, sizable blocks along the diagonal
would denote redundancy.
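
The numbers behind such a heatmap are pairwise correlations between metrics, measured across a collection of networks. The sketch below computes these correlations and groups strongly correlated metrics; note that a simple greedy threshold rule stands in for the hierarchical clustering used in an actual heatmap, and that the metric values, the 0.9 threshold and the function names are all made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either
    sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def redundant_groups(columns, threshold=0.9):
    """Greedily place each metric into the first group containing a metric
    whose absolute correlation with it reaches the threshold."""
    groups = []
    for name in columns:
        for group in groups:
            if any(abs(pearson(columns[name], columns[m])) >= threshold
                   for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy table: one value per network for each metric (illustrative only).
metrics = {
    "mean_degree": [2.0, 3.0, 4.0, 5.0, 6.0],
    "edge_density": [0.10, 0.15, 0.20, 0.25, 0.30],
    "clustering": [0.5, 0.1, 0.4, 0.2, 0.3],
}
groups = redundant_groups(metrics)
print(groups)
```

Here the first two metrics move together and land in one group, the analogue of a block on the heatmap diagonal, while the third stands alone and is therefore a candidate informative metric.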

Well-known statistical dimension-reduction techniques
like Principal Component Analysis (PCA) allow for a comprehensive
comparison across many metrics in networks. The essential
idea at the heart of PCA is to ensure that when high-dimensional
data is projected to a lower dimension, the maximum
variance is retained. PCA enables the projection of
an n-dimensional dataset onto an equi-dimensional space,
such that the "new axes" (in other words, the principal components)
are orthogonal. These principal components are
linear combinations of the original variables,
such that the first d axes, where d < n, retain the maximal
variance of the original data set.
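
The projection described above can be sketched with a singular value decomposition. This is an illustration under stated assumptions, not the exact pipeline of the cited papers: it assumes NumPy is available, standardizes each metric before the decomposition (a choice made here because metrics have different units), and runs on synthetic data in which two of three "metrics" are driven by the same latent quantity.

```python
import numpy as np


def pca(data):
    """PCA of a table with one row per network and one column per metric.
    Returns the principal axes (rows), the fraction of variance explained
    by each axis, and the data projected onto the axes."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0            # leave constant columns untouched
    z = x / std                    # standardize: an assumption of this sketch
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = z @ vt.T
    return vt, explained, scores


# Synthetic example: columns 0 and 1 share a latent driver, column 2 is noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
table = np.column_stack([
    latent + 0.05 * rng.normal(size=50),
    2.0 * latent + 0.05 * rng.normal(size=50),
    rng.normal(size=50),
])
axes, explained, scores = pca(table)
print("variance explained per component:", np.round(explained, 3))
print("first principal axis (metric loadings):", np.round(axes[0], 3))
```

Each row of `axes` gives the weights of the original metrics in one principal component, so the loadings show which linear combination of metrics carries most of the variance; keeping the first d columns of `scores` yields the reduced d-dimensional description.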

The power of these methods lies in the fact that they can
also be used for comparing network growth models among
themselves and for assessing how individual models fare with
respect to real-world networks (Filkov et al. 2009).

One might wonder if the consideration of the first few 
moments of a distribution is a mere bookkeeping exercise.
That they are indeed informative is reflected by the emphatic 
presence of a number of higher moments of metric distribu- 
tions in the first few principal components of analyzed data 
and/or models (Filkov et al. 2009, Agarwal et al. 2010, Vilar 
et al. 2010, Bounova et al. 2012). Similarly, metabolic networks
apparently bear strong signatures of organism phenotypes in 
some of the higher moments of their network metrics (Roy 
et al. 2009). 

In conclusion, the above discussion hopefully establishes
that topological analysis and modeling of networks in systems
biology and complex networks should venture beyond the treatment
of betweenness, degree, hubs, scale-free networks etc. and
instead focus on multiple metrics in networks.